Automatic Treebank Conversion via Informed Decoding
نویسندگان
چکیده
In this paper, we focus on the challenge of automatically converting a constituency treebank (source treebank) to fit the standard of another constituency treebank (target treebank). We formalize the conversion problem as an informed decoding procedure: information from original annotations in a source treebank is incorporated into the decoding phase of a parser trained on a target treebank during the parser assigning parse trees to sentences in the source treebank. Experiments on two Chinese treebanks show significant improvements in conversion accuracy over baseline systems, especially when training data used for building the parser is small in size.
منابع مشابه
ارائۀ راهکاری قاعدهمند جهت تبدیل خودکار درخت تجزیۀ نحوی وابستگی به درخت تجزیۀ نحوی ساختسازهای برای زبان فارسی
In this paper, an automatic method in converting a dependency parse tree into an equivalent phrase structure one, is introduced for the Persian language. In first step, a rule-based algorithm was designed. Then, Persian specific dependency-to-phrase structure conversion rules merged to the algorithm. Subsequently, the Persian dependency treebank with about 30,000 sentences was used as an input ...
متن کاملBetter Automatic Treebank Conversion Using A Feature-Based Approach
For the task of automatic treebank conversion, this paper presents a feature-based approach which encodes bracketing structures in a treebank into features to guide the conversion of this treebank to a different standard. Experiments on two Chinese treebanks show that our approach improves conversion accuracy by 1.31% over a strong baseline.
متن کاملتبدیل خودکار درختبانک وابستگی فارسی به درختبانک سازهای
There are two major types of treebanks: dependency-based and constituency-based. Both of them have applications in natural language processing and computational linguistics. Several dependency treebanks have been developed for Persian. However, there is no available big size constituency treebank for this language. In this paper, we aim to propose an algorithm for automatic conversion of a depe...
متن کاملAn Automatic Treebank Conversion Algorithm for Corpus Sharing
An automatic treebank conversion method is proposed in this paper to convert a treebank into another treebank. A new treebank associated with a different grammar can be generated automatically from the old one such that the information in the original treebank can be transformed to the new one and be shared among different research communities. The simple algorithm achieves conversion accuracy ...
متن کاملConverting the TüBa-D/Z Treebank of German to Universal Dependencies
This paper describes the conversion of TüBa-D/Z, one of the major German constituency treebanks, to Universal Dependencies. Besides the automatic conversion process, we describe manual annotation of a small part of the treebank based on the UD annotation scheme for the purposes of evaluating the automatic conversion. The automatic conversion shows fairly high agreement with the manual annotations.
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2010